WIP: The great split #353
base: main
Conversation
This sounds awesome @sebhoss!
I think that's a good idea - it seems sensible, and it should reduce the work the compiler has to do as well. This should significantly cut compilation time for this crate: even with only a couple of features selected, it was pretty heavyweight! A couple of notes:
You could use workspace dependency inheritance: put

```toml
dependencies.workspace = true
```

in the custom resource TOML files, then add:

```toml
[workspace.dependencies]
schemars = { version = "~0" }
serde = { version = "~1" }
serde_json = { version = "~1" }
k8s-openapi = { version = "~0" }
kube = { version = "~0", features = ["derive"] }
```

to the workspace Cargo.toml file. If that doesn't work, you can do it individually:

```toml
[dependencies]
schemars = { workspace = true }
serde = { workspace = true }
# ...
```

You can also do the same for package attributes:

```toml
[package]
authors.workspace = true
description.workspace = true
# ...
```

This would prevent the need to re-generate code when any of the workspace metadata changes?
When it comes to releasing the crates, is this really an issue? We won't be releasing all the crates every time, only when the code within changes? If so, then this issue only applies to the first release - and we can mitigate that by just initially releasing slowly.
Last comment/idea, sorry for the spam! Rather than version/release each crate individually and per resource, we could group them by namespace instead, i.e. rather than have ~40ish individual crates for closely related resources. From a UX point of view, you're more likely to use CRDs in these "groups", so you probably want all of the AWS resources available to you. This could be grouped by the last two parts of the CRD? So sqs.services.k8s.aws would become part of one grouped crate. But maybe this isn't always safe, idk. At the very least it might make sense for some specific hard-coded list of groups? AWS in particular?
Thanks for the feedback - I'll adjust the workspace related stuff as per your comment in my next push ❤️

Wrt. releasing crates: I'm not sure that will be an issue, but I don't want to be a bad citizen on crates.io, so spreading releases out a bit won't hurt? My current implementation just calculates the hash of the group name and then mods it with 24 and 60 to get hours and minutes.

I like the idea of grouping crates even further to reduce the total number of crates! I'm not sure we can use the last two segments of their group though, e.g. https://github.com/metio/kube-custom-resources-rs/tree/split/crd-catalog/aws sometimes uses other group suffixes. If we go down this road, how would this work with features though? Would we have one feature for each version in each group in each project? Using https://github.com/metio/kube-custom-resources-rs/tree/split/crd-catalog/aws/amazon-cloudwatch-agent-operator/cloudwatch.aws.amazon.com/v1alpha1 as an example, what would the feature for that version be called?
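As a rough sketch of that spreading calculation (assuming a standard-library hasher; the actual implementation may hash differently and use other names):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a custom resource group name to a publication time of day,
/// so crate releases are spread out instead of all hitting crates.io at once.
/// Sketch only: the hasher choice and function name are assumptions.
fn release_slot(group: &str) -> (u64, u64) {
    let mut hasher = DefaultHasher::new();
    group.hash(&mut hasher);
    let hash = hasher.finish();
    (hash % 24, hash % 60) // (hour, minute)
}

fn main() {
    let (hour, minute) = release_slot("cert-manager.io");
    println!("publish at {hour:02}:{minute:02} UTC");
}
```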
Hmm, on second thought perhaps it's best to not group them. It would work for larger groups of packages, but for smaller ones it would be confusing to add a package then enable a feature with the same name? It would also increase update churn - you're not going to be sure your specific CRDs are updated when you update a package, whereas with a single package per CRD you can be?
Hey! I think this is a good idea also. Fewer mega crates that do everything - everywhere I go I end up inlining just the generated structs I actually need instead.

My expectation on load is that, given how you've set it up (only publish when there are commits touching the directory), it'll likely end up being less load altogether on crates.io, as there will be less stuff released, but maybe there's some common overhead you incur. I definitely see much bigger offenders in the crates ecosystem in the build queue every now and then though (like repos releasing 100+ crates at every update unconditionally; those seem to get downprioritised in the queue).

Maybe the calendar versioning (masquerading as semver) that you inject with sed could be replaced with something more semver-like?
Hey all - thanks for the feedback & sorry for the delay here. I was busy moving to another city and life I guess 🤷

I agree with the package/feature name duplication. It doesn't really feel right, so I've switched back to one crate per custom resource group and one feature for each version within that group. I've added a default feature as well which enables all versions, because the crates.io UI has this quick copy&paste box that does not contain any features. My thinking here was that people should be good to go by just pasting the string they got from crates.io into their project, while still being able to fine-tune the enabled features later. The crate name length limit should be fine (at least right now): the maximum is 64 chars and we are at 43 with the longest crate name.

@clux can you clarify what a version number would look like in your idea? Would we always increment the major version and set minor/patch to 0? So like going from 1.0.0 to 2.0.0 to 3.0.0 on each release? My original idea behind the calendar based versioning was to give some meaning to the version number, since users can quickly see how old their CRDs are. However, I agree that the current versioning scheme has no meaning for auto-merging workflows that expect semver based behavior. I've replaced the sed call accordingly.

About releasing: how about we do not release to crates.io at all and instead tell users to declare a git dependency? I think there are a few advantages to that, and I can't really think of strong downsides. Maybe it feels weird because every other crate is fetched from crates.io?
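To illustrate the "one feature per version" layout described above, a generated per-group crate could wire its version modules up roughly like this (crate, feature, and module names here are invented for the example; the actual generated code may differ):

```rust
// Hypothetical lib.rs of a per-group crate (e.g. something like `kcr_example_group`).
// Each CRD version lives in its own module behind a feature of the same name.
// The crate's Cargo.toml would declare matching features, roughly:
//   [features]
//   default = ["v1alpha1", "v1beta1", "v1"]  # default enables every version,
//   v1alpha1 = []                            # matching the crates.io copy&paste box
//   v1beta1 = []
//   v1 = []

#[cfg(feature = "v1alpha1")]
pub mod v1alpha1;

#[cfg(feature = "v1beta1")]
pub mod v1beta1;

#[cfg(feature = "v1")]
pub mod v1;
```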
I would imagine making minor releases automatically (1.X++.0), and reserving majors for something unexpected that you actually want people to look at (if it should occur). People can already see how old things are from crates.io last publish or github release/tag dates. To be clear, I like the intention of giving at-a-glance information, but I don't like undermining an existing system with existing conventions (semver, where majors should be looked at) to present it.
This does mean people have to whitelist git dependencies in tools such as cargo-deny. It's occasionally a tricky thing to get buy-in for in some security focused organisations because people can (and do) overwrite git history. A crates registry pin is also more permanent (e.g. survives yanks).
Ok thanks for clarifying!
Another major downside of using git dependencies is that you're unable to publish crates with git dependencies onto crates.io.

Regarding the versioning, I'm of two minds. On one hand, as a user, I did value the date based (calver?) approach, and I'm also not entirely sure I'd want my CRDs updated "under my feet", so to speak. I never experienced this personally, but it seems possible that existing CRD definitions change in a way that causes my code to either break or refuse to compile. That would not be semver compatible. And we're not actually versioning Rust code here, not really - we're versioning collections of very loosely versioned external "things" that don't abide by semver at all. This leads to two possible options: keep the date based versioning, or adopt a semver-like scheme anyway. I'm not sure which is best: both have advantages and disadvantages, but it's not as clear cut as just "use semver".
Yeah, I think there is no clear best way to version the crates in this repo. I basically outsourced that problem to the upstream projects in my original design: if they introduce a breaking change, they have to increase the version of their CRD, which results in an entirely new Rust struct being generated while the old struct is kept around. Users then have a choice to update their code to the latest CRD version or keep using the old one as long as they like. However, updating versions of CRDs seems to be mostly done by humans, and therefore errors are to be expected. Again, I kinda outsourced that problem to upstream projects by just ignoring it in this repo and using calver to have an increasing version number and a simplified release process 😬 Additionally, there is still #289 around, which describes a breakage introduced by updating dependencies.

I've been toying around with cargo-release, cargo-workspaces and how we could version/release this repo for a bit now, and maybe there is at least something we can improve compared to the original design. My proposal is to use 1.X.Y as a version number, with X generated automatically for each release. The release process would basically stay the same: we spread the publication of crates in the repo throughout a day to combat limits like pksunkara/cargo-workspaces#177, and we do not commit versions back into git. Instead all crates keep a 0.0.0 version number which is incremented on the fly during the release.
serde_yml seems to be AI maintained and runs into an error like this:

```
Downloading https://raw.githubusercontent.com/jenkinsci/kubernetes-operator/master/config/crd/bases/jenkins.io_jenkins.yaml
thread 'main' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/libyml-0.0.5/src/scanner.rs:2798:17:
String join would overflow memory bounds
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e71f9a9a98b0faf423844bf0ba7438f29dc27d58/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/e71f9a9a98b0faf423844bf0ba7438f29dc27d58/library/core/src/panicking.rs:76:14
   2: libyml::scanner::yaml_parser_scan_flow_scalar
   3: libyml::scanner::yaml_parser_fetch_flow_scalar
   4: libyml::scanner::yaml_parser_fetch_next_token
   5: libyml::scanner::yaml_parser_fetch_more_tokens
   6: libyml::parser::peek_token
   7: libyml::parser::yaml_parser_parse_block_mapping_value
   8: libyml::parser::yaml_parser_state_machine
   9: libyml::parser::yaml_parser_parse
  10: serde_yml::libyml::parser::Parser::parse_next_event
  11: serde_yml::loader::Loader::next_document
  12: <serde_yml::de::Deserializer as core::iter::traits::iterator::Iterator>::next
  13: <serde_yml::de::Deserializer as core::iter::traits::iterator::Iterator>::next
  14: crd_v1_fetcher::parse_crds
  15: crd_v1_fetcher::main
  16: core::ops::function::FnOnce::call_once
```
So in the past, this project had to deal with various size related issues. In the very first version I was running with all features enabled and quickly realized that my computer does not have enough memory. I solved this by running commands like `cargo check` in a loop against each feature individually (see the sketch below) instead of running against the entire project with all features enabled.

This worked for a while, however once more and more custom resources were added and thus the number of features grew as well, we ran into a limit of crates.io, which only allows 300 features per crate nowadays. Originally, each version in each custom resource group had its own feature; this was changed so that each group is mapped to one feature and each version in that group is one module. Additionally, I contacted the crates.io team to increase the feature limit for this project, so that I can continue to grow past 300 custom resource groups. Lately, we are running into the 10 MiB per crate size limit of crates.io. Again, I contacted the crates.io team, who again were very helpful and quickly increased the size limit. However, both the feature and size limits will be reached again eventually, since the Kubernetes community is very active and creates more and more custom resources all the time.

The idea here is to come up with a solution that allows this project to grow indefinitely. The rather simple idea in this PR is to split all custom resource groups into their own crate. Each version within a group continues to be a module within that crate. Since each group typically only contains a handful of resources, I do not see any size related problems happening in the future (although it's possible in theory). Each crate will be versioned and released individually, and the release schedules are spread across an entire day, so that we won't hammer the crates.io API with hundreds of crates at the same time.
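A minimal sketch of that per-feature check loop (the feature names below are placeholders; in practice the list would be derived from the generated Cargo.toml rather than hard-coded):

```rust
use std::process::Command;

fn main() {
    // Placeholder feature names standing in for the generated custom resource groups.
    let features = ["argoproj_io", "cert_manager_io", "monitoring_coreos_com"];

    for feature in features {
        // Check one feature at a time with the default set disabled, so memory
        // usage is bounded by a single custom resource group per invocation.
        let status = Command::new("cargo")
            .args(["check", "--no-default-features", "--features", feature])
            .status()
            .expect("failed to spawn cargo");
        assert!(status.success(), "`cargo check` failed for feature {feature}");
    }
}
```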
Open questions:
- Crate naming: all crates currently get a `kcr_` prefix (which stands for kube-custom-resource).
- Release load: the crates.io team have asked me in the past to limit the number of releases, since the high number of features in the original crate causes too much strain on their API.

Feedback highly welcomed here. I'm pinging @w3irdrobot, @waynr, @banool, @brendanhay and @orf since you guys either pushed changes in this repo or somehow interacted with it in the past and I'm assuming you are using this project. Likewise @clux, since he created kopium and maintains kube-rs. I'm going to contact the crates.io team by mail since I don't know any of their usernames on GitHub (feel free to ping them here if you do?)
Ignore the build failures - this is WIP. If you want to see what this currently looks like, open https://github.com/metio/kube-custom-resources-rs/tree/split since the diff view in this PR will be completely useless 😅